this may cause significant errors in quantization. Before introducing new methods to improve
the quantization process, we highlight the notations used in XNOR-Net [199] that will be
used in our discussions. For each layer in a CNN, I is the input, W is the weight filter, B is
the binarized weight (taking values ±1), and H is the binarized input.
Rastegari et al. [199] propose Binary-Weight-Networks (BWN) and XNOR-Networks.
BWN approximates the weights with binary values and is thus a variant of a BNN. XNOR-Networks
binarize both the weights and the activations and are therefore 1-bit networks. Both networks
use the idea of a scaling factor. In BWN, the real-valued weight filter W is estimated
using a binary filter B and a scaling factor α. The convolutional operation is then approxi-
mated by:
\[
I \ast W \approx (I \oplus B)\,\alpha, \tag{1.6}
\]
where ⊕ indicates a convolution without multiplication. By introducing the scaling factor,
binary weight filters reduce memory usage by roughly 32× compared to single-precision
filters. To ensure W is approximately equal to αB, BWN formulates an optimization problem,
and the optimal solution is:
\[
B^{*} = \operatorname{sign}(W), \tag{1.7}
\]
\[
\alpha^{*} = \frac{W^{T}\operatorname{sign}(W)}{n} = \frac{\sum_{i}|W_{i}|}{n} = \frac{1}{n}\,\|W\|_{\ell 1}. \tag{1.8}
\]
Therefore, the optimal binary weight filter is obtained simply by taking the sign of the
weight values, and the optimal scaling factor is the average of their absolute values.
The scaling factor is also used when computing the gradient in backpropagation.
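As a rough illustration, the following NumPy sketch binarizes a single weight filter in the BWN style of Eqs. (1.7)–(1.8); the function name and the convention of mapping exact zeros to +1 are our own choices, not prescribed by [199]:
\begin{verbatim}
import numpy as np

def binarize_filter(W):
    """Approximate a real-valued filter W by alpha * B, as in Eqs. (1.7)-(1.8)."""
    B = np.sign(W)
    B[B == 0] = 1.0                    # map exact zeros to +1 so B lies in {-1, +1}
    alpha = np.abs(W).mean()           # alpha* = (1/n) * ||W||_l1
    return B, alpha

# Toy example: one 3x3 filter with 16 input channels
W = np.random.randn(16, 3, 3).astype(np.float32)
B, alpha = binarize_filter(W)
print(np.mean(np.abs(W - alpha * B)))  # reconstruction error of the binary approximation
\end{verbatim}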
The core idea of XNOR-Net is the same as that of BWN, but a second scaling factor, β, is used
when binarizing the input I into H. Experiments show that this approach outperforms
BinaryConnect and BNN by a large margin on ImageNet. Unlike XNOR-Net, which sets the
scaling factor to the mean of the weights, Xu et al. [266] define a trainable scaling factor
for both weights and activations. LQ-Nets [284] quantize both weights and activations
with arbitrary bit-widths, including 1 bit. Their quantizers are learned jointly with the
network yet remain compatible with bitwise operations, preserving the fast-inference merit of
quantized neural networks (QNNs).
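To illustrate the effect of the two scaling factors, here is a minimal sketch of an XNOR-style approximation of a single dot product. The names xnor_dot, alpha, and beta are ours, and the sketch simplifies the scheme: a real implementation computes β per spatial location and replaces the floating-point product of signs with XNOR and popcount operations.
\begin{verbatim}
import numpy as np

def xnor_dot(x, w):
    """Approximate the real-valued dot product x . w with binarized operands."""
    alpha = np.abs(w).mean()              # weight scaling factor (as in BWN)
    beta = np.abs(x).mean()               # input scaling factor
    b_w = np.where(w >= 0, 1.0, -1.0)     # binarized weights B
    b_x = np.where(x >= 0, 1.0, -1.0)     # binarized inputs H
    return alpha * beta * np.dot(b_x, b_w)

x = np.random.randn(256)
w = np.random.randn(256)
print(np.dot(x, w), xnor_dot(x, w))       # exact vs. binarized approximation
\end{verbatim}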
Building on XNOR-Net [199], High-Order Residual Quantization (HORQ) [138] provides
a high-order binarization scheme that achieves a more accurate approximation while
retaining the advantage of binary operations. HORQ calculates the residual error and then
performs a new round of thresholding to further approximate the residual. This
binary approximation of the residual can be considered a higher-order binary input. Following
XNOR-Net, HORQ defines the first-order residual tensor R1(X) as the difference
between the real-valued input and its first-order binary approximation:
\[
R_1(X) = X - \beta_1 H_1 \approx \beta_2 H_2, \tag{1.9}
\]
where R1(X) is a real-valued tensor. By analogy, R2(X) is the second-order residual tensor,
which can in turn be approximated by β3H3. Recursively applying this procedure yields the
order-K residual quantization:
\[
X \approx \sum_{i=1}^{K} \beta_i H_i. \tag{1.10}
\]
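The recursion can be sketched in a few lines of NumPy. The name horq_quantize is ours, and the per-round scaling rule below simply reuses the mean-of-absolute-values solution from Eq. (1.8) as one reasonable reading of the thresholding step:
\begin{verbatim}
import numpy as np

def horq_quantize(X, K):
    """Order-K residual quantization of a tensor X, in the spirit of Eqs. (1.9)-(1.10)."""
    betas, Hs = [], []
    residual = X.astype(np.float64)
    for _ in range(K):
        H = np.where(residual >= 0, 1.0, -1.0)   # binary tensor H_i
        beta = np.abs(residual).mean()           # scaling factor beta_i
        betas.append(beta)
        Hs.append(H)
        residual = residual - beta * H           # next residual R_i(X)
    return betas, Hs

X = np.random.randn(4, 8, 8)
betas, Hs = horq_quantize(X, K=2)
approx = sum(b * H for b, H in zip(betas, Hs))
print(np.mean(np.abs(X - approx)))               # error shrinks as K grows
\end{verbatim}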
During training, the HORQ network reshapes the input tensor into a matrix and expresses it
as a residual quantization of any order. Experiments show that HORQ-Net outperforms
XNOR-Net in accuracy on the CIFAR dataset.